3.9 A likelihood ratio test for nested composite hypotheses: Wilks's theorem.

Let $\Theta$ be a $d$-dimensional parameter space, specifically, an open set in $\mathbb{R}^d$. Let $H_0$ be a
$k$-dimensional subset of $\Theta$, in a sense to be made more precise below, for some $k < d$.
For example, $H_0$ could be the intersection with $\Theta$ of a $k$-dimensional flat hyperplane. Let
$\{P_\theta, \theta \in \Theta\}$ be an equivalent family of laws on a sample space $(X, \mathcal{B})$ with a likelihood
function $f(\theta, x) > 0$ for all $\theta \in \Theta$ and $x \in X$.

Assume that observations $X_1, \ldots, X_n$ are i.i.d. $P_\theta$ for some $\theta \in \Theta$. We want to test the
hypothesis that $\theta \in H_0$. S. S. Wilks proposed the following test: let $L(\theta, x) := \log f(\theta, x)$
be the log likelihood. For $n$ observations, let the maximum log likelihoods over $\Theta$ and $H_0$
be respectively

$$\mathrm{MLL}_d := \sup_{\theta \in \Theta} \sum_{j=1}^n L(\theta, X_j), \qquad \mathrm{MLL}_k := \sup_{\theta \in H_0} \sum_{j=1}^n L(\theta, X_j).$$

Let $W := 2(\mathrm{MLL}_d - \mathrm{MLL}_k)$. Wilks found that if the hypothesis $H_0$ is true, then the
distribution of $W$ converges as $n \to \infty$ to a $\chi^2$ distribution with $d - k$ degrees of freedom,
not depending on the true $\theta = \theta_0 \in H_0$. Thus, $H_0$ would be rejected if $W$ is too large in
terms of the tabulated $\chi^2_{d-k}$ distribution.
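
As a minimal numerical sketch of the test just described, consider the normal model $N(\mu, \sigma^2)$ with $d = 2$ and the hypothesis $\mu = 0$, so that $k = 1$. This model is an illustrative choice and is not taken from the text. Both maximum log likelihoods have closed forms here, so $W = n \log(\hat\sigma_0^2 / \hat\sigma^2)$. The sample size, seed, value of $\sigma$, and the tabulated 5% point 3.841 of $\chi^2_1$ are assumptions made for the illustration.

```python
import numpy as np

def wilks_statistic(x):
    """W = 2(MLL_d - MLL_k) for N(mu, sigma^2) with H0: mu = 0 (d = 2, k = 1)."""
    n = x.size
    var_full = np.mean((x - x.mean()) ** 2)   # MLE of sigma^2 over the full model
    var_null = np.mean(x ** 2)                # MLE of sigma^2 when mu = 0
    # Each maximized log likelihood equals -(n/2) * (log(2*pi*var) + 1),
    # so the constants cancel in the difference.
    return n * np.log(var_null / var_full)

rng = np.random.default_rng(0)
n, reps = 200, 4000
# Simulate under H0 (true mu = 0; sigma = 2 is an arbitrary choice).
w = np.array([wilks_statistic(rng.normal(0.0, 2.0, size=n)) for _ in range(reps)])

CHI2_1_CRIT_05 = 3.841  # tabulated upper 5% point of chi-square with 1 d.f.
rejection_rate = np.mean(w > CHI2_1_CRIT_05)
print(f"mean of W: {w.mean():.3f}, rejection rate: {rejection_rate:.3f}")
```

If the limit theorem applies, the simulated mean of $W$ should be near $d - k = 1$ and the rejection rate near 0.05, up to simulation error and finite-$n$ effects.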

It turns out that Wilks's conclusion can be proved under the same assumptions as
are used to prove the lower bounds on asymptotic efficiency of estimators in Section 3.7
and efficiency of maximum likelihood estimators in Section 3.8. It will be said that $H_0$ is
a $k$-dimensional imbedded submanifold of $\Theta$ for some $k < d$ if for each $\theta \in H_0$, after
a translation of coordinates taking $\theta$ to $0$ and a suitable rotation of coordinates, $H_0$ has a
tangent hyperplane $K_0$ at $0$ given by $\theta_{k+1} = \cdots = \theta_d = 0$, meaning that the intersection
of $H_0$ with a neighborhood $V$ of $0$ is given by $\theta_j = f_j(\{\theta_i\}_{i=1}^k)$ for $j = k+1, \ldots, d$, where
$f_j$ are $C^2$ functions defined on an open neighborhood $W$ of $0$ in $\mathbb{R}^k$ with $f_j(0) = 0$ and
$\nabla f_j(0) = 0 \in \mathbb{R}^k$ for each $j = k+1, \ldots, d$.

We have the following: 

3.9.1 Theorem (Wilks's theorem). Assume (AC-1) through (AC-5) in Section 3.7 and
(EML-1) through (EML-5) in Section 3.8 for $\Theta$, where in (EML-3), $T_n$ are maximum
likelihood estimators of $\theta \in \Theta$. Let $H_0$ be a $k$-dimensional imbedded submanifold of
$\Theta$ containing $\theta_0$ for some $k < d$. Let $H_0$ be parametrized in a neighborhood of $\theta_0$ by
$\eta := \{\eta_i\}_{i=1}^k$ in the open set $W \subset \mathbb{R}^k$ in the given definition of imbedded submanifold.
Let $U_n$ be maximum likelihood estimators of $\eta$ in $W$, assumed to exist and be unique with
probability converging to 1 as $n \to \infty$. Assume also that $U_n \to \eta_0 = 0$ in probability as
$n \to \infty$.

Then as $n \to \infty$, the distribution of $W$ converges to a $\chi^2_{d-k}$ distribution.

Proof. By the way, (EML-1) implies (AC-1), and (AC-2) and (AC-3) imply (EML-2).
(AC-6) follows from (EML-1) through (EML-5) by way of Theorem 3.8.1. As in the
definition of submanifold, we can assume by translation and rotation of coordinates that
$\theta_0 = 0$ and $H_0$ has the tangent hyperplane $K_0$ at $0$. Then

$$\theta \mapsto \bigl(\theta_1, \ldots, \theta_k,\ \theta_{k+1} - f_{k+1}(\{\theta_i\}_{i=1}^k),\ \ldots,\ \theta_d - f_d(\{\theta_i\}_{i=1}^k)\bigr)$$

is a $C^2$ map of the open neighborhood $V$ of $0$ in $\mathbb{R}^d$ onto another such neighborhood $U$,
with a $C^2$ inverse given by

$$\phi \mapsto \bigl(\phi_1, \ldots, \phi_k,\ \phi_{k+1} + f_{k+1}(\{\phi_i\}_{i=1}^k),\ \ldots,\ \phi_d + f_d(\{\phi_i\}_{i=1}^k)\bigr).$$

Thus the map is what is called a diffeomorphism. It takes $H_0 \cap V$ onto $K_0 \cap U$. Thus
we can assume that $H_0$ is the flat hyperplane $K_0$, replacing $V$ by $U$.

Now, make another rotation of coordinates so that the Fisher information matrix
$I(\theta_0) = I(0)$ is diagonalized. Let its diagonal entries be $a_1, \ldots, a_d$, all $> 0$. Then by
a linear change of parameters, replacing $\theta_j$ by $\sqrt{a_j}\,\theta_j$, $I(0)$ becomes the identity matrix.
Since $K_0$ is still a $k$-dimensional linear subspace after these transformations, by another
rotation we can assume $H_0 = K_0$ is (as before) the hyperplane $\theta_{k+1} = \cdots = \theta_d = 0$. All
the transformations made, and their inverses, have been $C^2$ with bounded first and second
partial derivatives for their coordinates on a neighborhood of $0$. Thus, by the chain rule,
all the assumptions still hold in the new coordinates.
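
The rotation and rescaling that turn $I(0)$ into the identity can be checked numerically. The sketch below uses a randomly generated positive definite matrix as a stand-in for the Fisher information (a hypothetical input, not from the text) and assumes the standard rule that under a linear change of parameters $\psi = A\theta$ the information matrix becomes $(A^{-1})' I(0) A^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
B = rng.normal(size=(d, d))
fisher = B @ B.T + d * np.eye(d)      # hypothetical positive definite I(0)

# Rotation: fisher = Q diag(a) Q^T with all a_j > 0.
a, Q = np.linalg.eigh(fisher)

# New parameter psi = A theta with A = diag(sqrt(a)) Q^T, i.e. rotate, then
# replace each coordinate theta_j by sqrt(a_j) * theta_j.
A = np.diag(np.sqrt(a)) @ Q.T
A_inv = np.linalg.inv(A)

# Information transforms by the Jacobian of theta = A^{-1} psi.
fisher_new = A_inv.T @ fisher @ A_inv
print(np.round(fisher_new, 10))
```

Because $A$ is linear and invertible, a $k$-dimensional linear subspace is mapped to another $k$-dimensional linear subspace, which is why one further rotation restores the form $\theta_{k+1} = \cdots = \theta_d = 0$.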

Now, it's easily verified that since $\theta_0 \in H_0$, all the assumptions (AC-1) through (AC-5)
and (EML-1) through (EML-5) imply their counterparts for $K_0 \cap U$ in place of $\Theta$ except
that for (EML-3), consistency of the MLEs $U_n \in K_0 \cap U$ has been separately assumed.

Let $V_n := \sqrt{n}\,T_n$ and $W_n := \sqrt{n}\,U_n \in K_0$. Then by Theorem 3.8.1, as $n \to \infty$ we
have convergence in distribution

$$\mathcal{L}(V_n) \to N_d(0, I), \qquad \mathcal{L}(W_n) \to N_k(0, I) \tag{3.9.2}$$

on $\mathbb{R}^d$ and $K_0$ respectively, where $N_r(0, I)$ is the $r$-dimensional standard normal
distribution for $r = k, d$.

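A multivariate Taylor expansion of $L(\theta, x)$ around $\theta = 0$ is given by

$$L(\theta, x) = L(0, x) + \nabla L(0, x) \cdot \theta + \tfrac{1}{2}\,\theta' \mathcal{H}_d(0, x)\,\theta + R(\theta, x) \tag{3.9.3}$$

where $\mathcal{H}_d(0, x)$ is the matrix $\{\partial^2 L(0, x)/\partial\theta_r \partial\theta_s\}_{r,s=1}^d$ and for each $x$, the remainder $R(\theta, x) =
o(|\theta|^2)$ as $\theta \to 0$. Let $S_n := \sum_{j=1}^n \nabla L(0, X_j)$. We have $E_0 \nabla L(0, x) = 0$ by assumption
(AC-3). Thus by the central limit theorem (RAP, 9.5.6) and assumption (EML-2), the
distribution of $S_n/\sqrt{n}$ converges as $n \to \infty$ to $N_d(0, I)$.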

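By a Taylor expansion of $\nabla_\theta L(\theta, x)$ around $\theta = 0$ we get for each $x$ and for $\theta$ close
enough to $0$

$$\nabla_\theta L(\theta, x) = \nabla_\theta L(0, x) + \mathcal{H}_d(0, x) \cdot \theta + r(\theta, x)$$

where the remainder term $r(\theta, x) = o(|\theta|)$ as $\theta \to 0$ for $x$ fixed. Thus

$$\sum_{j=1}^n \nabla_\theta L(\theta, X_j) = S_n + \sum_{j=1}^n \mathcal{H}_d(0, X_j) \cdot \theta + \sum_{j=1}^n r(\theta, X_j).$$

Substituting $\theta = T_n$, where the left side is $0$, and dividing by $\sqrt{n}$ gives

$$\frac{S_n}{\sqrt{n}} = -\frac{1}{n} \sum_{j=1}^n \mathcal{H}_d(0, X_j) \cdot V_n + \sqrt{n}\,o_p(T_n), \tag{3.9.4}$$

if $\sum_{j=1}^n r(T_n, X_j) = n\,o_p(T_n)$, as will be shown after the main proof is completed. By
(AC-4), $E_0 \mathcal{H}_d(0, x) = -I(0) = -I$ in $\mathbb{R}^{d^2}$. Thus by the law of large numbers (for each
of the matrix entries), we have $(-1/n) \sum_{j=1}^n \mathcal{H}_d(0, X_j) = I + o_p(1)$. Since $V_n = O_p(1)$
and $\sqrt{n}\,o_p(T_n) = o_p(1)$ by (3.9.2), we get by (3.9.4)

$$V_n = S_n/\sqrt{n} + o_p(1). \tag{3.9.5}$$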

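Analogously, define $S_n^{(k)} := \sum_{j=1}^n \nabla_{\theta(k)} L(0, X_j)$ where $\nabla_{\theta(k)} := (\partial/\partial\theta_1, \ldots, \partial/\partial\theta_k)$.
Then $S_n^{(k)}$ consists of just the first $k$ coordinates of $S_n$. In the same way as in (3.9.5) we
then get

$$W_n = S_n^{(k)}/\sqrt{n} + o_p(1). \tag{3.9.6}$$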

By the definitions of $\mathrm{MLL}_d$ and $T_n$ and (3.9.3), if $\sum_{j=1}^n R(T_n, X_j) = n\,o_p(|T_n|^2)$ as will be
shown in (3.9.14) below, it follows that

$$\mathrm{MLL}_d = \sum_{j=1}^n L(T_n, X_j) = \sum_{j=1}^n L(0, X_j) + S_n \cdot T_n + \frac{1}{2}\,T_n' \sum_{j=1}^n \mathcal{H}_d(0, X_j)\,T_n + n\,o_p(|T_n|^2). \tag{3.9.7}$$

We have $n\,o_p(|T_n|^2) = o_p(1)$ by (3.9.2). By (3.9.5), we have $S_n \cdot T_n = |V_n|^2 + o_p(1)$. In the
term of (3.9.7) with $\mathcal{H}_d$, we can replace $T_n$ by $V_n$ twice, dividing the sum by $n$, and then
as in the proof of (3.9.5) we see that the term is $-(1/2)|V_n|^2 + o_p(1)$. Thus (3.9.7) yields

$$\mathrm{MLL}_d = \sum_{j=1}^n L(0, X_j) + \frac{1}{2}|V_n|^2 + o_p(1).$$

Proceeding in the same way for $\mathrm{MLL}_k$ we get

$$\mathrm{MLL}_k = \sum_{j=1}^n L(0, X_j) + \frac{1}{2}|W_n|^2 + o_p(1).$$

For the Wilks statistic $W$, applying (3.9.5) and (3.9.6), we get

$$W = 2(\mathrm{MLL}_d - \mathrm{MLL}_k) = \bigl(|S_n|^2 - |S_n^{(k)}|^2\bigr)/n + o_p(1) = |Y^{(d-k)}|^2/n + o_p(1)$$

where $Y^{(d-k)}$ is the projection of $S_n$ onto the $d - k$ coordinates $\theta_{k+1}, \ldots, \theta_d$. Since $S_n/\sqrt{n}$
converges in distribution to $N_d(0, I)$, $Y^{(d-k)}/\sqrt{n}$ converges in distribution to $N_{d-k}(0, I)$.
It follows that the distribution of $W$ converges to $\chi^2_{d-k}$.
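
The last step, that discarding the first $k$ coordinates of a standard normal vector in $\mathbb{R}^d$ leaves a squared norm with a $\chi^2_{d-k}$ distribution, can be illustrated by simulation. This is only a sketch of the limiting distribution: the vector `Z` below stands in for the limit of $S_n/\sqrt{n}$, and the values of `d`, `k`, the seed, and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, reps = 5, 2, 200_000
Z = rng.standard_normal(size=(reps, d))       # stands in for the limit of S_n / sqrt(n)

full = np.sum(Z ** 2, axis=1)                 # limit of |S_n|^2 / n
restricted = np.sum(Z[:, :k] ** 2, axis=1)    # limit of |S_n^(k)|^2 / n
w_limit = full - restricted                   # squared norm of the last d - k coordinates

print(w_limit.mean(), w_limit.var())
```

A $\chi^2_{d-k}$ variable has mean $d - k$ and variance $2(d - k)$, so with these choices the printed values should be close to 3 and 6.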

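Proof of (3.9.4) and (3.9.14). For (3.9.4) we need to show that $\sum_{j=1}^n r(T_n, X_j) = n\,o_p(T_n)$.
For each $r = 1, \ldots, d$, we have

$$\frac{\partial L(\theta, x)}{\partial\theta_r} - \frac{\partial L(0, x)}{\partial\theta_r} = \int_0^1 \frac{d}{dt}\,\frac{\partial L(t\theta, x)}{\partial\theta_r}\,dt = \sum_{s=1}^d \theta_s \int_0^1 \frac{\partial^2 L(t\theta, x)}{\partial\theta_r \partial\theta_s}\,dt. \tag{3.9.8}$$

Now, for each $r, s = 1, \ldots, d$,

$$\int_0^1 \frac{\partial^2 L(t\theta, x)}{\partial\theta_r \partial\theta_s}\,dt = \frac{\partial^2 L(0, x)}{\partial\theta_r \partial\theta_s} + Q_{rs}(\theta, x)$$

where

$$|Q_{rs}(\theta, x)| \le \varepsilon_{rs}(\theta, x) := \sup_{|\phi| \le |\theta|} \left| \frac{\partial^2 L(\phi, x)}{\partial\theta_r \partial\theta_s} - \frac{\partial^2 L(0, x)}{\partial\theta_r \partial\theta_s} \right|. \tag{3.9.9}$$

It follows by (3.9.8) that the remainder satisfies

$$|r(T_n, X_j)| \le |T_n| \sum_{r,s=1}^d \varepsilon_{rs}(T_n, X_j). \tag{3.9.10}$$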

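By (AC-5), we have $\varepsilon_{rs}(\theta, x) \le 2M(x)$ for all $x$ and all $\theta$ in a small enough neighborhood
$U_1$ of $0$. We can assume that $M(x) \ge 1$ for all $x$. Also, by the $C^2$ property of $f$ in (AC-2)
and (AC-1), $L(\theta, x)$ is $C^2$ in $\theta$, so $\varepsilon_{rs}(\theta, x) \to 0$ as $\theta \to 0$. So, for any given $\varepsilon > 0$ and for
all $x$, there is a positive integer $k = k(x, \varepsilon)$ such that if $|\theta| < 1/k$ then

$$\varepsilon(\theta, x) := \sum_{r,s=1}^d \varepsilon_{rs}(\theta, x) < \varepsilon. \tag{3.9.11}$$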

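Since $E_0 M < \infty$, by dominated convergence there is a $\gamma > 0$ small enough so that if
$P_0(A) < \gamma$, then $\int_A d^2 M\,dP_0 < \varepsilon/2$. For $k = k(\varepsilon)$ large enough, the set $A$ of $x$ such that
(3.9.11) fails has $P_0(A) < \gamma$ and so $\int_A M\,dP_0 < \varepsilon/2$. Thus $P_0(A) < \varepsilon/2$ since $M \ge 1$.
For $|\theta| < 1/k(\varepsilon)$ we have $\varepsilon(\theta, x) < \varepsilon$ for $x \notin A$ and $\varepsilon(\theta, x) \le 2d^2 (M 1_A)(x)$ otherwise. By the
strong law of large numbers, we have almost surely for $n$ large enough, by choice of $A$,
$\frac{1}{n} \sum_{j=1}^n 2d^2 (M 1_A)(X_j) < \varepsilon$ and then

$$\frac{1}{n} \sum_{j=1}^n \varepsilon(\theta, X_j) \le \varepsilon + \frac{2d^2}{n} \sum_{j=1}^n (M 1_A)(X_j) < 2\varepsilon. \tag{3.9.12}$$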

As $n \to \infty$, since $T_n \to 0$ in probability by (3.9.2), we have

$$P(|T_n| < 1/k(\varepsilon)) \to 1. \tag{3.9.13}$$

It follows by (3.9.10) that $\sum_{j=1}^n r(T_n, X_j) = n\,o_p(T_n)$ as desired, so (3.9.4) is proved.

It remains to prove

$$\sum_{j=1}^n R(T_n, X_j) = n\,o_p(|T_n|^2). \tag{3.9.14}$$

By a form of Taylor's theorem with integral remainder we have

$$R(\theta, x) = \theta' \int_0^1 (1 - u)\,[\mathcal{H}_d(u\theta, x) - \mathcal{H}_d(0, x)]\,du \cdot \theta.$$

Thus for $\varepsilon_{rs}(\theta, x)$ as defined in (3.9.9) and $\varepsilon(\theta, x)$ in (3.9.11),

$$|R(\theta, x)| \le \sum_{r,s=1}^d |\theta_r|\,|\theta_s|\,\varepsilon_{rs}(\theta, x) \le |\theta|^2\,\varepsilon(\theta, x).$$

Then by (3.9.12) and (3.9.13), (3.9.14) follows, completing the proof of the theorem. □

PROBLEM 

1. Let $\Theta = \mathbb{R}^d$, let $P_\mu := N(\mu, I)$ for $\mu \in \mathbb{R}^d$ and for some $k < d$ let $H_0 := K_0 :=
\{\mu : \mu_{k+1} = \cdots = \mu_d = 0\}$. Show that in this case the Wilks statistic $W := 2(\mathrm{MLL}_d -
\mathrm{MLL}_k)$ has exactly a $\chi^2_{d-k}$ distribution for all $n$, not only asymptotically as $n \to \infty$.

NOTES 

Wilks first published his theorem in a paper, Wilks (1938), then gave an exposition 
of it in his book, Wilks (1962, §13.8). Chernoff (1954) gave another proof. Van der Vaart
(1998, Chapter 16) gives a more recent exposition. The Notes by van der Vaart (1998, 
p. 240) suggest that Wilks's original proof was not rigorous. The bibliography in van der 
Vaart's book includes Wilks's 1938 paper but not the 1962 book. The proof in the 1962 
book seems rather long. 